On this document, I’ve included the results from the initial exploration into the different model outputs, ranking of covariate influence, performance metrics, and prediction maps. This first set of models only includes extracted covariate data at a daily temporal resolution, but I am also considering exploring models that include covariate data at a seasonal or annual temporal resolution. The pseudo absences used in these models were generated using background sampling approaches. Lastly, hyperparameters were tuned using the caret package and across all models, a learning rate of 0.05 and tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a random set of numbers that will be used to identify which predictor variables should be included in the final model, and which are not informative.
The hypotheses I would like to test with these models are as follows:
H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.
study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?
H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.
study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?
H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.
study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?
H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.
study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?
H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.
study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?
Base models
These three models represent three different options for the base model and either include spatial predictors, a tag ID predictor, both, or neither. These models were developed by splitting the data set into 75/25 train/test, and thus that is the model evaluation approach used here. However, once a model is selected, I can run additional evaluation metrics (i.e., LOO, k-fold). I can also complete these now depending on when that is typically performed.
base_Nspat_Ntag <- readRDS (brt_outputs[7 ])
# model performance
ggBRT:: ggPerformance (base_Nspat_Ntag)
Model 1
Total.Deviance 1.3862741
Residual.Deviance 0.2861955
Correlation 0.9282161
AUC 0.9929000
Per.Expl 79.3550598
cvDeviance 0.5921329
cvCorrelation 0.8021263
cvAUC 0.9463200
cvPer.Expl 57.2860162
#relative influence of predictors
ggBRT:: ggInfluence (base_Nspat_Ntag)
rel.inf
bathy_mean 37.846057
temp_mean 24.062148
sal_mean 7.155808
chl_mean 6.204518
ssh_mean 5.944164
uostr_mean 4.190075
vostr_mean 3.770208
bathy_sd 2.868287
mld_mean 2.643751
uo_mean 2.213286
vo_mean 1.880648
pred_var 1.221051
#explore partial plots
gbm.plot (base_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Nspat_Ntag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 2 temp_mean 854.94
2 10 bathy_mean 3 sal_mean 698.33
3 10 bathy_mean 8 ssh_mean 563.67
4 8 ssh_mean 2 temp_mean 551.11
5 10 bathy_mean 4 uo_mean 540.01
6 3 sal_mean 2 temp_mean 399.47
7 2 temp_mean 1 chl_mean 346.97
#predictive performance using test dataset
preds <- predict.gbm (base_Nspat_Ntag, dat_test_base,
n.trees = base_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Nspat_Ntag)
#eval 75/25
eval_7525_modified (base_Nspat_Ntag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2282071 0.8916574 0.9800992 0.9987515 0.7293395 0.7935506
base_Nspat_Ytag <- readRDS (brt_outputs[8 ])
# model performance
ggBRT:: ggPerformance (base_Nspat_Ytag)
Model 1
Total.Deviance 1.3862741
Residual.Deviance 0.1110165
Correlation 0.9807084
AUC 0.9998000
Per.Expl 91.9917378
cvDeviance 0.3495621
cvCorrelation 0.8963288
cvAUC 0.9783500
cvPer.Expl 74.7840586
#relative influence of predictors
ggBRT:: ggInfluence (base_Nspat_Ytag)
rel.inf
bathy_mean 32.1041550
tag 24.5344527
temp_mean 18.3478756
uostr_mean 5.0132259
ssh_mean 4.4470038
sal_mean 4.3156858
chl_mean 3.5816288
vostr_mean 2.8189697
bathy_sd 1.2908042
vo_mean 1.2581968
uo_mean 0.9661739
mld_mean 0.8414705
pred_var 0.4803574
#explore partial plots
gbm.plot (base_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Nspat_Ytag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 1956.77
2 11 bathy_mean 1 tag 827.44
3 3 temp_mean 1 tag 701.21
4 2 chl_mean 1 tag 665.02
5 9 ssh_mean 1 tag 579.04
6 7 vo_mean 1 tag 394.71
7 11 bathy_mean 3 temp_mean 305.91
8 6 uostr_mean 1 tag 288.92
#predictive performance using test dataset
preds <- predict.gbm (base_Nspat_Ytag, dat_test_base,
n.trees = base_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Nspat_Ytag)
#eval 75/25
eval_7525_modified (base_Nspat_Ytag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1469908 0.956733 0.9939051 0.9935086 0.8688192 0.9199174
base_Yspat_Ytag <- readRDS (brt_outputs[9 ])
# model performance
ggBRT:: ggPerformance (base_Yspat_Ytag)
Model 1
Total.Deviance 1.38627408
Residual.Deviance 0.09036597
Correlation 0.98482457
AUC 0.99990000
Per.Expl 93.48137767
cvDeviance 0.29733241
cvCorrelation 0.91408473
cvAUC 0.98300000
cvPer.Expl 78.55168620
#relative influence of predictors
ggBRT:: ggInfluence (base_Yspat_Ytag)
rel.inf
dist_coast 52.3832103
tag 20.2478553
lat 9.1003879
temp_mean 4.2416317
bathy_mean 3.5147582
chl_mean 2.7750598
sal_mean 2.0626823
ssh_mean 1.2159874
vostr_mean 1.1867095
uo_mean 0.6788234
vo_mean 0.6673415
uostr_mean 0.5772978
bathy_sd 0.5066506
mld_mean 0.5003317
pred_var 0.3412726
#explore partial plots
gbm.plot (base_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Yspat_Ytag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 651.04
2 3 chl_mean 1 tag 595.00
3 12 bathy_mean 1 tag 544.74
4 5 sal_mean 1 tag 499.41
5 14 dist_coast 1 tag 381.15
6 9 vostr_mean 1 tag 319.69
7 8 vo_mean 1 tag 285.70
8 4 temp_mean 1 tag 249.85
9 10 ssh_mean 1 tag 193.53
10 13 bathy_sd 1 tag 163.65
11 6 uo_mean 1 tag 123.96
#predictive performance using test dataset
preds <- predict.gbm (base_Yspat_Ytag, dat_test_base,
n.trees = base_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Yspat_Ytag)
#eval 75/25
eval_7525_modified (base_Yspat_Ytag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1339725 0.9641455 0.9954561 0.9899414 0.889162 0.9348138
DO models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables as longer time scales (seasonal/annual).
0m, no spatial, yes tag 0m, yes spatial, yes tag 0m & 60m, no spatial, yes tag 0m & 250m, no spatial, yes tag 0m, 60m, & 250m, no spatial, yes tag 0m, 60m, & 250m, yes spatial, yes tag
do_0m_Nspat_Ytag <- readRDS (brt_outputs[14 ])
# model performance
ggBRT:: ggPerformance (do_0m_Nspat_Ytag)
Model 1
Total.Deviance 1.38629302
Residual.Deviance 0.08569242
Correlation 0.98673362
AUC 1.00000000
Per.Expl 93.81859271
cvDeviance 0.30444849
cvCorrelation 0.91329019
cvAUC 0.98270000
cvPer.Expl 78.03866260
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Nspat_Ytag)
rel.inf
bathy_mean 31.3188924
o2_mean_0m 25.6922130
tag 20.8632526
temp_mean 4.9825010
ssh_mean 3.9449855
chl_mean 3.6738368
vostr_mean 2.1762902
uostr_mean 1.9424041
sal_mean 1.8962177
bathy_sd 0.9210810
mld_mean 0.8088337
uo_mean 0.7033574
vo_mean 0.6742398
pred_var 0.4018948
#explore partial plots
gbm.plot (do_0m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 5 sal_mean 1 tag 1317.79
2 12 bathy_mean 1 tag 898.27
3 2 o2_mean_0m 1 tag 753.94
4 4 temp_mean 2 o2_mean_0m 731.95
5 13 bathy_sd 1 tag 461.04
6 4 temp_mean 1 tag 452.00
7 3 chl_mean 1 tag 350.97
8 10 ssh_mean 1 tag 322.16
9 9 vostr_mean 1 tag 308.68
10 8 vo_mean 1 tag 280.17
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1200159 0.9715853 0.9970746 0.998006 0.9054067 0.9381859
do_0m_Yspat_Ytag <- readRDS (brt_outputs[15 ])
# model performance
ggBRT:: ggPerformance (do_0m_Yspat_Ytag)
Model 1
Total.Deviance 1.38629302
Residual.Deviance 0.07111301
Correlation 0.98956599
AUC 1.00000000
Per.Expl 94.87027583
cvDeviance 0.26513270
cvCorrelation 0.92577703
cvAUC 0.98583000
cvPer.Expl 80.87470024
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Yspat_Ytag)
rel.inf
dist_coast 52.2693314
tag 18.2789790
o2_mean_0m 9.9461704
lat 7.1842134
bathy_mean 3.0959517
chl_mean 1.9510753
sal_mean 1.4925425
temp_mean 1.2416253
vostr_mean 0.9383412
ssh_mean 0.8462597
vo_mean 0.5322002
bathy_sd 0.5206533
uo_mean 0.5128695
mld_mean 0.4554318
uostr_mean 0.4344147
pred_var 0.2999407
#explore partial plots
gbm.plot (do_0m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Yspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 3 o2_mean_0m 1 tag 681.29
2 2 lat 1 tag 614.15
3 15 dist_coast 1 tag 509.20
4 13 bathy_mean 1 tag 470.74
5 5 temp_mean 3 o2_mean_0m 461.63
6 4 chl_mean 1 tag 409.48
7 6 sal_mean 1 tag 393.97
8 14 bathy_sd 1 tag 350.26
9 10 vostr_mean 1 tag 252.37
10 9 vo_mean 1 tag 250.86
11 15 dist_coast 3 o2_mean_0m 214.24
12 11 ssh_mean 1 tag 202.54
13 5 temp_mean 1 tag 189.74
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Yspat_Ytag, dat_test_do,
n.trees = do_0m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_Yspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5500 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1124583 0.9749439 0.9977097 0.9977912 0.9178943 0.9487028
do_0m_60m_Nspat_Ytag <- readRDS (brt_outputs[13 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_Nspat_Ytag)
Model 1
Total.Deviance 1.38629302
Residual.Deviance 0.06921534
Correlation 0.99039620
AUC 1.00000000
Per.Expl 95.00716346
cvDeviance 0.28881667
cvCorrelation 0.91957663
cvAUC 0.98372000
cvPer.Expl 79.16626123
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_Nspat_Ytag)
rel.inf
bathy_mean 28.6020233
o2_mean_0m 25.7568149
tag 19.2681616
o2_mean_60m 10.5230432
ssh_mean 3.9366672
chl_mean 3.0096521
temp_mean 1.8833026
sal_mean 1.7203767
uostr_mean 1.2905618
vostr_mean 1.2578356
mld_mean 0.6504126
bathy_sd 0.6501386
vo_mean 0.6103547
uo_mean 0.5075413
pred_var 0.3331139
#explore partial plots
gbm.plot (do_0m_60m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 827.96
2 5 sal_mean 1 tag 765.02
3 2 o2_mean_0m 1 tag 726.73
4 14 o2_mean_60m 1 tag 465.01
5 4 temp_mean 2 o2_mean_0m 448.02
6 3 chl_mean 1 tag 401.11
7 4 temp_mean 1 tag 368.01
8 9 vostr_mean 1 tag 330.29
9 13 bathy_sd 1 tag 285.54
10 10 ssh_mean 1 tag 258.37
11 8 vo_mean 1 tag 217.71
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_60m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1131281 0.9746304 0.9972402 0.9972539 0.9148331 0.9500716
do_0m_250m_Nspat_Ytag <- readRDS (brt_outputs[10 ])
# model performance
ggBRT:: ggPerformance (do_0m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862930
Residual.Deviance 0.0772215
Correlation 0.9880214
AUC 1.0000000
Per.Expl 94.4296408
cvDeviance 0.2923215
cvCorrelation 0.9180102
cvAUC 0.9832900
cvPer.Expl 78.9134402
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_250m_Nspat_Ytag)
rel.inf
o2_mean_0m 26.7124835
bathy_mean 24.5325773
tag 19.6354077
o2_mean_250m 16.3256290
ssh_mean 2.3877300
chl_mean 2.0737228
temp_mean 1.9080636
sal_mean 1.5922732
vostr_mean 1.0597981
bathy_sd 0.9220362
uostr_mean 0.8579829
vo_mean 0.6550567
uo_mean 0.5147781
mld_mean 0.5111500
pred_var 0.3113108
#explore partial plots
gbm.plot (do_0m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_250m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 5 sal_mean 1 tag 1083.83
2 2 o2_mean_0m 1 tag 810.86
3 4 temp_mean 2 o2_mean_0m 611.00
4 14 o2_mean_250m 1 tag 533.46
5 12 bathy_mean 1 tag 465.44
6 4 temp_mean 1 tag 407.57
7 3 chl_mean 1 tag 361.70
8 13 bathy_sd 1 tag 323.33
9 10 ssh_mean 1 tag 307.93
10 9 vostr_mean 1 tag 257.21
11 14 o2_mean_250m 2 o2_mean_0m 241.29
#predictive performance using test dataset
preds <- predict.gbm (do_0m_250m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_250m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1166426 0.9730488 0.9972335 0.9974503 0.9110834 0.9442964
do_0m_60m_250m_Nspat_Ytag <- readRDS (brt_outputs[11 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.38629302
Residual.Deviance 0.06405767
Correlation 0.99133355
AUC 1.00000000
Per.Expl 95.37921112
cvDeviance 0.27651361
cvCorrelation 0.92323053
cvAUC 0.98486000
cvPer.Expl 80.05373977
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Nspat_Ytag)
rel.inf
o2_mean_0m 26.3744059
bathy_mean 24.2567146
tag 18.4164082
o2_mean_250m 12.9309717
o2_mean_60m 5.8644578
ssh_mean 3.4303989
chl_mean 1.8947511
temp_mean 1.3576031
sal_mean 1.2825109
uostr_mean 0.8488525
vostr_mean 0.8461226
bathy_sd 0.6934417
vo_mean 0.5279284
mld_mean 0.5255140
uo_mean 0.4617161
pred_var 0.2882026
#explore partial plots
gbm.plot (do_0m_60m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 o2_mean_0m 1 tag 756.09
2 5 sal_mean 1 tag 744.60
3 4 temp_mean 2 o2_mean_0m 681.68
4 12 bathy_mean 1 tag 457.99
5 15 o2_mean_250m 1 tag 423.21
6 14 o2_mean_60m 1 tag 351.76
7 3 chl_mean 1 tag 331.31
8 4 temp_mean 1 tag 322.01
9 13 bathy_sd 1 tag 241.45
10 9 vostr_mean 1 tag 237.92
11 8 vo_mean 1 tag 230.84
12 10 ssh_mean 1 tag 227.37
13 7 uostr_mean 1 tag 207.53
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_60m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1111881 0.9754482 0.9973543 0.9967165 0.9183581 0.9537921
do_0m_60m_250m_Yspat_Ytag <- readRDS (brt_outputs[12 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Yspat_Ytag)
Model 1
Total.Deviance 1.38629302
Residual.Deviance 0.06107201
Correlation 0.99179288
AUC 1.00000000
Per.Expl 95.59458127
cvDeviance 0.26289954
cvCorrelation 0.92771979
cvAUC 0.98575000
cvPer.Expl 81.03578841
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Yspat_Ytag)
rel.inf
dist_coast 50.7776327
tag 17.4708084
o2_mean_0m 10.2952610
o2_mean_250m 4.9246556
lat 4.4525242
o2_mean_60m 2.8631545
chl_mean 1.8981111
bathy_mean 1.7092848
sal_mean 1.0541439
temp_mean 0.9245410
vostr_mean 0.6430521
ssh_mean 0.6087970
vo_mean 0.4919662
uostr_mean 0.4554889
bathy_sd 0.4404908
mld_mean 0.4397390
uo_mean 0.3089437
pred_var 0.2414052
#explore partial plots
gbm.plot (do_0m_60m_250m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Yspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 3 o2_mean_0m 1 tag 639.99
2 6 sal_mean 1 tag 381.15
3 14 bathy_sd 1 tag 369.34
4 15 dist_coast 1 tag 365.12
5 2 lat 1 tag 346.60
6 17 o2_mean_250m 1 tag 306.59
7 4 chl_mean 1 tag 290.18
8 13 bathy_mean 1 tag 285.08
9 16 o2_mean_60m 5 temp_mean 253.71
10 16 o2_mean_60m 1 tag 244.79
11 5 temp_mean 3 o2_mean_0m 222.92
12 10 vostr_mean 1 tag 215.04
13 9 vo_mean 1 tag 167.52
14 8 uostr_mean 1 tag 116.74
15 15 dist_coast 3 o2_mean_0m 109.89
16 11 ssh_mean 1 tag 109.82
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Yspat_Ytag, dat_test_do,
n.trees = do_0m_60m_250m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Yspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1067755 0.9773981 0.9982434 0.9959618 0.9259605 0.9559458
AGI models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables as longer time scales (seasonal/annual).
0m, no spatial, yes tag 0m, yes spatial, yes tag 0m & 60m, no spatial, yes tag 0m & 250m, no spatial, yes tag 0m, 60m, & 250m, no spatial, yes tag 0m, 60m, & 250m, yes spatial, yes tag
agi_0m_Nspat_Ytag <- readRDS (brt_outputs[5 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Nspat_Ytag)
Model 1
Total.Deviance 1.38628919
Residual.Deviance 0.09058218
Correlation 0.98574976
AUC 1.00000000
Per.Expl 93.46585281
cvDeviance 0.31679477
cvCorrelation 0.91027275
cvAUC 0.98076000
cvPer.Expl 77.14800254
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Nspat_Ytag)
rel.inf
bathy_mean 31.7564227
tag 22.4317739
temp_mean 19.1472195
uostr_mean 4.6441372
ssh_mean 4.6035826
AGI_0m 4.3344168
sal_mean 3.7523624
chl_mean 2.9185772
vostr_mean 2.4429168
bathy_sd 1.2895177
vo_mean 0.8922124
mld_mean 0.7121506
uo_mean 0.6928352
pred_var 0.3818752
#explore partial plots
gbm.plot (agi_0m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1670.77
2 4 sal_mean 1 tag 1358.50
3 3 temp_mean 1 tag 796.37
4 11 bathy_mean 1 tag 716.67
5 12 bathy_sd 1 tag 500.56
6 8 vostr_mean 1 tag 360.82
7 2 chl_mean 1 tag 350.27
8 11 bathy_mean 3 temp_mean 343.69
9 9 ssh_mean 1 tag 293.85
10 7 vo_mean 1 tag 283.73
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5900 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1269187 0.9680701 0.9963482 0.9991957 0.8970815 0.9346585
agi_0m_Yspat_Ytag <- readRDS (brt_outputs[6 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Yspat_Ytag)
Model 1
Total.Deviance 1.38628919
Residual.Deviance 0.07536563
Correlation 0.98861806
AUC 1.00000000
Per.Expl 94.56349886
cvDeviance 0.28457545
cvCorrelation 0.92193385
cvAUC 0.98308000
cvPer.Expl 79.47214380
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Yspat_Ytag)
rel.inf
dist_coast 52.6885132
tag 19.0100569
lat 9.3755080
temp_mean 3.8940828
bathy_mean 3.1187408
AGI_0m 2.5281708
chl_mean 2.1679033
sal_mean 2.0794485
ssh_mean 1.1326443
vostr_mean 0.8369160
uostr_mean 0.6860411
bathy_sd 0.6046589
vo_mean 0.5672036
uo_mean 0.5434052
mld_mean 0.4635287
pred_var 0.3031779
#explore partial plots
gbm.plot (agi_0m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Yspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 618.71
2 5 sal_mean 1 tag 458.94
3 3 chl_mean 1 tag 457.19
4 12 bathy_mean 1 tag 444.88
5 15 dist_coast 1 tag 353.77
6 4 temp_mean 1 tag 314.80
7 13 bathy_sd 1 tag 307.47
8 8 vo_mean 1 tag 282.40
9 9 vostr_mean 1 tag 277.46
10 14 AGI_0m 4 temp_mean 224.72
11 14 AGI_0m 1 tag 223.58
12 10 ssh_mean 1 tag 155.83
13 7 uostr_mean 1 tag 125.51
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Yspat_Ytag, dat_test_agi,
n.trees = agi_0m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_Yspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1172909 0.9726691 0.996907 0.9962742 0.9108565 0.945635
agi_0m_60m_Nspat_Ytag <- readRDS (brt_outputs[4 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_Nspat_Ytag)
Model 1
Total.Deviance 1.38628919
Residual.Deviance 0.08163925
Correlation 0.98784986
AUC 1.00000000
Per.Expl 94.11095093
cvDeviance 0.30657340
cvCorrelation 0.91435083
cvAUC 0.98164000
cvPer.Expl 77.88532168
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_Nspat_Ytag)
rel.inf
bathy_mean 31.0165218
tag 22.7089581
temp_mean 19.3123505
uostr_mean 4.0921717
AGI_0m 4.0228983
AGI_60m 3.9028901
sal_mean 3.7019688
ssh_mean 3.0607407
vostr_mean 2.4285168
chl_mean 2.2359268
bathy_sd 1.0013958
vo_mean 0.8073497
uo_mean 0.7416074
mld_mean 0.6546704
pred_var 0.3120329
#explore partial plots
gbm.plot (agi_0m_60m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1546.61
2 4 sal_mean 1 tag 1160.06
3 3 temp_mean 1 tag 718.56
4 11 bathy_mean 1 tag 678.18
5 14 AGI_60m 1 tag 562.58
6 11 bathy_mean 3 temp_mean 375.86
7 2 chl_mean 1 tag 371.78
8 8 vostr_mean 1 tag 332.14
9 12 bathy_sd 1 tag 314.26
10 7 vo_mean 1 tag 270.59
11 9 ssh_mean 1 tag 248.80
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1207813 0.9711081 0.9967905 0.99765 0.905258 0.9411095
agi_0m_250m_Nspat_Ytag <- readRDS (brt_outputs[1 ])
# model performance
ggBRT:: ggPerformance (agi_0m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.38628919
Residual.Deviance 0.08329738
Correlation 0.98684251
AUC 1.00000000
Per.Expl 93.99134165
cvDeviance 0.29978280
cvCorrelation 0.91514338
cvAUC 0.98237000
cvPer.Expl 78.37516119
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_250m_Nspat_Ytag)
rel.inf
bathy_mean 25.7556370
tag 21.5992835
temp_mean 17.8343206
AGI_250m 12.5454181
uostr_mean 4.7132797
ssh_mean 4.0214353
AGI_0m 3.6118354
sal_mean 3.1006797
chl_mean 1.9387158
vostr_mean 1.4830631
bathy_sd 1.1646793
vo_mean 0.7268532
mld_mean 0.5851221
uo_mean 0.5700321
pred_var 0.3496450
#explore partial plots
gbm.plot (agi_0m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_250m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 1107.60
2 13 AGI_0m 3 temp_mean 959.81
3 3 temp_mean 1 tag 653.96
4 14 AGI_250m 1 tag 483.70
5 2 chl_mean 1 tag 436.22
6 12 bathy_sd 1 tag 411.43
7 11 bathy_mean 1 tag 378.45
8 7 vo_mean 1 tag 342.67
9 8 vostr_mean 1 tag 323.33
10 9 ssh_mean 1 tag 260.62
11 6 uostr_mean 1 tag 213.62
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_250m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_250m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.123622 0.9696011 0.9959497 0.9972633 0.9008517 0.9399134
agi_0m_60m_250m_Nspat_Ytag <- readRDS (brt_outputs[2 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.38628919
Residual.Deviance 0.07572957
Correlation 0.98877144
AUC 1.00000000
Per.Expl 94.53724618
cvDeviance 0.29301289
cvCorrelation 0.91783184
cvAUC 0.98281000
cvPer.Expl 78.86350878
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Nspat_Ytag)
rel.inf
bathy_mean 26.8363780
tag 21.0957906
temp_mean 18.0262218
AGI_250m 12.1947644
uostr_mean 4.4840074
AGI_0m 3.8063516
ssh_mean 3.0616953
sal_mean 2.7962168
chl_mean 1.7130823
AGI_60m 1.4809786
vostr_mean 1.2200984
bathy_sd 1.0170479
vo_mean 0.8630534
uo_mean 0.5773383
mld_mean 0.5453203
pred_var 0.2816549
#explore partial plots
gbm.plot (agi_0m_60m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1152.68
2 4 sal_mean 1 tag 1005.00
3 3 temp_mean 1 tag 635.71
4 15 AGI_250m 1 tag 397.25
5 2 chl_mean 1 tag 396.66
6 14 AGI_60m 1 tag 391.13
7 7 vo_mean 1 tag 371.09
8 11 bathy_mean 1 tag 357.41
9 12 bathy_sd 1 tag 320.70
10 8 vostr_mean 1 tag 299.57
11 9 ssh_mean 1 tag 232.02
12 15 AGI_250m 3 temp_mean 161.28
13 6 uostr_mean 1 tag 142.65
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.11824 0.9722116 0.9961794 0.9980251 0.9071832 0.9453725
agi_0m_60m_250m_Yspat_Ytag <- readRDS (brt_outputs[3 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Yspat_Ytag)
Model 1
Total.Deviance 1.38628919
Residual.Deviance 0.07231845
Correlation 0.98904888
AUC 1.00000000
Per.Expl 94.78330693
cvDeviance 0.27235546
cvCorrelation 0.92438633
cvAUC 0.98474000
cvPer.Expl 80.35363316
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Yspat_Ytag)
rel.inf
dist_coast 52.2028237
tag 19.8286802
lat 8.5259754
temp_mean 3.6800178
AGI_250m 3.2802055
AGI_0m 2.2035478
bathy_mean 1.8557018
chl_mean 1.7402778
sal_mean 1.4654785
AGI_60m 1.1294626
ssh_mean 0.7267973
vostr_mean 0.7162280
uostr_mean 0.5488083
bathy_sd 0.5294415
vo_mean 0.5050275
mld_mean 0.4284863
uo_mean 0.3811785
pred_var 0.2518614
#explore partial plots
gbm.plot (agi_0m_60m_250m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Yspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 455.96
2 16 AGI_60m 1 tag 450.35
3 3 chl_mean 1 tag 425.15
4 14 AGI_0m 4 temp_mean 409.72
5 13 bathy_sd 1 tag 384.02
6 5 sal_mean 1 tag 331.25
7 12 bathy_mean 1 tag 330.47
8 17 AGI_250m 1 tag 285.00
9 9 vostr_mean 1 tag 237.90
10 4 temp_mean 1 tag 237.18
11 8 vo_mean 1 tag 234.93
12 14 AGI_0m 1 tag 169.22
13 15 dist_coast 1 tag 157.56
14 7 uostr_mean 1 tag 127.32
15 16 AGI_60m 3 chl_mean 99.12
16 14 AGI_0m 2 lat 88.90
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Yspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_250m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Yspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.115729 0.9733434 0.9965503 0.9956153 0.9118642 0.9478331
Summary table of results
output_sum <- read.csv (here ("data/brt/mod_outputs/brt_bckg_output_summary.csv" ))
kableExtra:: kable (output_sum)
base_0m_Nspat_Ntag
79.360
0.375
0.740
0.872
0.980
0.228
0.891
0.980
0.998
0.794
base_0m_Nspat_Ytag
91.990
0.182
0.759
0.957
0.994
0.147
0.957
0.994
0.994
0.920
base_0m_Yspat_Ytag
93.500
0.154
0.771
0.965
0.995
0.134
0.964
0.995
0.990
0.935
do_0m_Nspat_Ytag
93.820
0.131
0.769
0.976
0.997
0.120
0.972
0.997
0.998
0.938
do_0m_Yspat_Ytag
94.870
0.114
0.781
0.977
0.998
0.112
0.975
0.998
0.998
0.949
do_0m_60m_Nspat_Ytag
95.007
0.118
0.772
0.978
0.997
0.113
0.975
0.997
0.997
0.950
do_0m_250m_Nspat_Ytag
94.430
0.123
0.777
0.976
0.997
0.117
0.973
0.997
0.997
0.944
do_0m_60m_250m_Nspat_Ytag
95.380
0.113
0.778
0.979
0.997
0.111
0.975
0.997
0.997
0.954
do_0m_60m_250m_Yspat_Ytag
95.590
0.103
0.783
0.978
0.998
0.107
0.977
0.998
0.996
0.956
agi_0m_Nspat_Ytag
93.470
0.143
0.764
0.970
0.996
0.127
0.968
0.996
0.999
0.934
agi_0m_Yspat_Ytag
94.560
0.124
0.776
0.976
0.997
0.117
0.973
0.997
0.996
0.946
agi_0m_60m_Nspat_Ytag
94.111
0.131
0.764
0.971
0.997
0.121
0.971
0.997
0.998
0.941
agi_0m_250m_Nspat_Ytag
93.990
0.137
0.769
0.970
0.996
0.124
0.970
0.996
0.997
0.940
agi_0m_60m_250m_Nspat_Ytag
94.540
0.129
0.768
0.973
0.996
0.118
0.972
0.996
0.998
0.945
agi_0m_60m_250m_Yspat_Ytag
94.780
0.122
0.773
0.974
0.997
0.116
0.973
0.997
0.996
0.948
ggplot (output_sum, aes (AUC, TSS, color = percent_explained, label = model)) +
geom_point (size = 3 ) +
xlab ('AUC' ) +
ylab ('TSS' ) +
scale_color_gradientn (colors = MetBrewer:: met.brewer ("Greek" )) +
ggrepel:: geom_label_repel (aes (label = model),
box.padding = 0.35 ,
point.padding = 0.5 ,
segment.color = 'grey50' )
DO models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
0m, no spatial, no tag 0m, yes spatial, no tag 0m & 60m, no spatial, no tag 0m & 250m, no spatial, no tag 0m, 60m, & 250m, no spatial, no tag 0m, 60m, & 250m, yes spatial, no tag
do_0m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[12 ])
# model performance
ggBRT:: ggPerformance (do_0m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862934
Residual.Deviance 0.2188393
Correlation 0.9490943
AUC 0.9966000
Per.Expl 84.2140724
cvDeviance 0.5170811
cvCorrelation 0.8339445
cvAUC 0.9588700
cvPer.Expl 62.7004570
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Nspat_Ntag)
rel.inf
bathy_mean 36.618282
o2_mean_0m 29.909294
temp_mean 7.873891
chl_mean 5.305086
ssh_mean 4.223122
sal_mean 3.004185
vostr_mean 2.838490
mld_mean 2.084183
bathy_sd 1.974745
uostr_mean 1.913768
uo_mean 1.739428
vo_mean 1.407480
pred_var 1.108047
#explore partial plots
gbm.plot (do_0m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 689.53
2 11 bathy_mean 4 sal_mean 592.77
3 11 bathy_mean 9 ssh_mean 576.98
4 11 bathy_mean 5 uo_mean 442.14
5 9 ssh_mean 3 temp_mean 385.11
6 11 bathy_mean 3 temp_mean 378.70
7 11 bathy_mean 1 o2_mean_0m 266.27
8 12 bathy_sd 1 o2_mean_0m 255.59
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2078864 0.9108071 0.9855748 0.9981125 0.7717969 0.8421407
do_0m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[13 ])
# model performance
ggBRT:: ggPerformance (do_0m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862934
Residual.Deviance 0.1949867
Correlation 0.9556768
AUC 0.9975000
Per.Expl 85.9346757
cvDeviance 0.4783362
cvCorrelation 0.8485144
cvAUC 0.9643500
cvPer.Expl 65.4953096
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Yspat_Ntag)
rel.inf
dist_coast 53.5696625
o2_mean_0m 12.6309567
lat 8.1195782
bathy_mean 5.6664282
chl_mean 3.8500282
temp_mean 2.9692000
sal_mean 2.4553503
ssh_mean 2.1991953
vostr_mean 1.6248422
mld_mean 1.5073425
uo_mean 1.3556893
bathy_sd 1.2144774
uostr_mean 1.0434857
vo_mean 0.9699671
pred_var 0.8237967
#explore partial plots
gbm.plot (do_0m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Yspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 4 temp_mean 969.31
2 4 temp_mean 2 o2_mean_0m 582.13
3 12 bathy_mean 1 lat 295.21
4 12 bathy_mean 10 ssh_mean 220.95
5 3 chl_mean 2 o2_mean_0m 208.59
6 4 temp_mean 1 lat 197.51
7 12 bathy_mean 6 uo_mean 180.50
8 13 bathy_sd 2 o2_mean_0m 168.48
9 3 chl_mean 1 lat 143.75
10 14 dist_coast 5 sal_mean 142.09
11 12 bathy_mean 5 sal_mean 139.44
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Yspat_Ntag, dat_test_do,
n.trees = do_0m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_Yspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1961425 0.9209367 0.9877846 0.9968631 0.7936337 0.8593468
do_0m_60m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[11 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862934
Residual.Deviance 0.2152171
Correlation 0.9494120
AUC 0.9963000
Per.Expl 84.4753535
cvDeviance 0.4980739
cvCorrelation 0.8417192
cvAUC 0.9613600
cvPer.Expl 64.0715342
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_Nspat_Ntag)
rel.inf
bathy_mean 33.3337448
o2_mean_0m 29.2907706
o2_mean_60m 10.7154473
chl_mean 4.7874063
ssh_mean 4.4654220
temp_mean 4.2047433
sal_mean 2.8985582
vostr_mean 1.9514731
mld_mean 1.8318151
uo_mean 1.6194828
bathy_sd 1.5388993
uostr_mean 1.4127257
vo_mean 0.9997546
pred_var 0.9497570
#explore partial plots
gbm.plot (do_0m_60m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 587.36
2 11 bathy_mean 9 ssh_mean 529.13
3 13 o2_mean_60m 3 temp_mean 467.19
4 11 bathy_mean 5 uo_mean 395.13
5 11 bathy_mean 3 temp_mean 317.18
6 11 bathy_mean 4 sal_mean 312.00
7 6 uostr_mean 1 o2_mean_0m 296.24
8 12 bathy_sd 1 o2_mean_0m 223.69
9 13 o2_mean_60m 2 chl_mean 166.11
10 11 bathy_mean 1 o2_mean_0m 151.42
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_60m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_60m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4350 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2039139 0.9142042 0.9862436 0.9962925 0.7796691 0.8447535
do_0m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[8 ])
# model performance
ggBRT:: ggPerformance (do_0m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862934
Residual.Deviance 0.2335030
Correlation 0.9429315
AUC 0.9954000
Per.Expl 83.1563108
cvDeviance 0.5005420
cvCorrelation 0.8398630
cvAUC 0.9608000
cvPer.Expl 63.8935019
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_250m_Nspat_Ntag)
rel.inf
bathy_mean 30.0267521
o2_mean_0m 29.7310875
o2_mean_250m 17.3780027
temp_mean 3.9785508
chl_mean 3.7684071
sal_mean 2.9143743
ssh_mean 2.5775664
bathy_sd 1.7071494
vostr_mean 1.6822884
uo_mean 1.4379251
uostr_mean 1.4253686
mld_mean 1.3330125
vo_mean 1.1026202
pred_var 0.9368949
#explore partial plots
gbm.plot (do_0m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_250m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 4 sal_mean 334.54
2 3 temp_mean 1 o2_mean_0m 330.27
3 11 bathy_mean 9 ssh_mean 277.86
4 13 o2_mean_250m 1 o2_mean_0m 276.54
5 6 uostr_mean 1 o2_mean_0m 255.46
6 11 bathy_mean 3 temp_mean 207.76
7 2 chl_mean 1 o2_mean_0m 190.07
8 12 bathy_sd 1 o2_mean_0m 186.35
9 13 o2_mean_250m 4 sal_mean 156.17
10 13 o2_mean_250m 3 temp_mean 151.82
#predictive performance using test dataset
preds <- predict.gbm (do_0m_250m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_250m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
3850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2116599 0.9071004 0.9846599 0.9967308 0.7667759 0.8315631
do_0m_60m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[9 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862934
Residual.Deviance 0.1967857
Correlation 0.9555813
AUC 0.9975000
Per.Expl 85.8049005
cvDeviance 0.4879500
cvCorrelation 0.8446379
cvAUC 0.9630100
cvPer.Expl 64.8018226
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Nspat_Ntag)
rel.inf
bathy_mean 29.784726
o2_mean_0m 29.575237
o2_mean_250m 12.535092
o2_mean_60m 6.692749
chl_mean 3.607960
temp_mean 3.236009
sal_mean 2.956732
ssh_mean 2.508817
vostr_mean 1.606424
bathy_sd 1.536286
uo_mean 1.440651
uostr_mean 1.322227
mld_mean 1.317766
vo_mean 1.024736
pred_var 0.854590
#explore partial plots
gbm.plot (do_0m_60m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 575.57
2 2 chl_mean 1 o2_mean_0m 367.59
3 11 bathy_mean 4 sal_mean 247.44
4 11 bathy_mean 5 uo_mean 244.71
5 13 o2_mean_60m 3 temp_mean 233.27
6 11 bathy_mean 9 ssh_mean 222.57
7 11 bathy_mean 3 temp_mean 204.95
8 14 o2_mean_250m 13 o2_mean_60m 192.80
9 14 o2_mean_250m 1 o2_mean_0m 184.47
10 13 o2_mean_60m 11 bathy_mean 168.43
11 12 bathy_sd 1 o2_mean_0m 167.06
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_60m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1981623 0.9191668 0.9872286 0.999063 0.7896551 0.858049
do_0m_60m_250m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[10 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862934
Residual.Deviance 0.1711730
Correlation 0.9637236
AUC 0.9984000
Per.Expl 87.6524677
cvDeviance 0.4688544
cvCorrelation 0.8531779
cvAUC 0.9656700
cvPer.Expl 66.1792818
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Yspat_Ntag)
rel.inf
dist_coast 51.2057041
o2_mean_0m 11.5940030
o2_mean_250m 7.6458020
lat 4.9904405
bathy_mean 3.8441712
o2_mean_60m 3.4353426
chl_mean 3.1827866
temp_mean 2.7383315
sal_mean 2.2099326
ssh_mean 1.9054953
mld_mean 1.3048380
vostr_mean 1.1565039
uo_mean 1.1379053
bathy_sd 1.1340787
uostr_mean 0.9609815
vo_mean 0.7913691
pred_var 0.7623142
#explore partial plots
gbm.plot (do_0m_60m_250m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Yspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 4 temp_mean 460.19
2 12 bathy_mean 6 uo_mean 369.18
3 4 temp_mean 2 o2_mean_0m 294.93
4 15 o2_mean_60m 4 temp_mean 221.21
5 3 chl_mean 2 o2_mean_0m 213.95
6 16 o2_mean_250m 1 lat 194.30
7 12 bathy_mean 1 lat 176.52
8 12 bathy_mean 5 sal_mean 174.94
9 15 o2_mean_60m 5 sal_mean 157.19
10 12 bathy_mean 3 chl_mean 136.43
11 3 chl_mean 1 lat 135.67
12 15 o2_mean_60m 12 bathy_mean 121.03
13 7 uostr_mean 2 o2_mean_0m 114.49
14 12 bathy_mean 10 ssh_mean 109.97
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Yspat_Ntag, dat_test_do,
n.trees = do_0m_60m_250m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Yspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1850589 0.9301022 0.9896335 0.9972148 0.8116198 0.8765247
AGI models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
0m, no spatial, no tag 0m, yes spatial, no tag 0m & 60m, no spatial, no tag 0m & 250m, no spatial, no tag 0m, 60m, & 250m, no spatial, no tag 0m, 60m, & 250m, yes spatial, no tag
agi_0m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[5 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862490
Residual.Deviance 0.2494636
Correlation 0.9389402
AUC 0.9946000
Per.Expl 82.0044167
cvDeviance 0.5227643
cvCorrelation 0.8310463
cvAUC 0.9575400
cvPer.Expl 62.2892911
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Nspat_Ntag)
rel.inf
bathy_mean 36.491740
temp_mean 21.375886
AGI_0m 10.198606
sal_mean 5.377754
ssh_mean 5.311421
uostr_mean 5.009516
chl_mean 4.871749
vostr_mean 3.562961
bathy_sd 2.025273
mld_mean 1.700623
uo_mean 1.632024
vo_mean 1.472134
pred_var 0.970312
#explore partial plots
gbm.plot (agi_0m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 6302.78
2 10 bathy_mean 2 temp_mean 752.63
3 10 bathy_mean 8 ssh_mean 463.53
4 12 AGI_0m 8 ssh_mean 447.89
5 5 uostr_mean 2 temp_mean 358.75
6 10 bathy_mean 3 sal_mean 358.05
7 10 bathy_mean 4 uo_mean 261.61
8 8 ssh_mean 4 uo_mean 253.56
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2116179 0.9077497 0.9853149 0.9994755 0.7644717 0.8200442
agi_0m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[6 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862490
Residual.Deviance 0.2061652
Correlation 0.9520613
AUC 0.9968000
Per.Expl 85.1278375
cvDeviance 0.4844196
cvCorrelation 0.8457186
cvAUC 0.9631500
cvPer.Expl 65.0553696
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Yspat_Ntag)
rel.inf
dist_coast 53.2176953
lat 10.6179286
AGI_0m 7.1940580
bathy_mean 6.9716928
temp_mean 5.2270863
chl_mean 3.4956185
sal_mean 2.7812823
ssh_mean 2.0438267
uo_mean 1.4980073
mld_mean 1.4534671
vostr_mean 1.4445966
uostr_mean 1.1261180
bathy_sd 1.0472647
vo_mean 1.0242153
pred_var 0.8571425
#explore partial plots
gbm.plot (agi_0m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Yspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 2156.60
2 13 AGI_0m 1 lat 801.90
3 3 temp_mean 1 lat 588.20
4 11 bathy_mean 3 temp_mean 459.87
5 14 dist_coast 10 mld_mean 407.88
6 13 AGI_0m 11 bathy_mean 278.43
7 11 bathy_mean 5 uo_mean 268.20
8 11 bathy_mean 2 chl_mean 191.28
9 11 bathy_mean 1 lat 162.84
10 14 dist_coast 4 sal_mean 158.85
11 6 uostr_mean 1 lat 153.97
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Yspat_Ntag, dat_test_agi,
n.trees = agi_0m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_Yspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4400 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.195868 0.921452 0.9884097 0.9994483 0.7941426 0.8512784
agi_0m_60m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[4 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862490
Residual.Deviance 0.2005887
Correlation 0.9556745
AUC 0.9975000
Per.Expl 85.5301104
cvDeviance 0.5068771
cvCorrelation 0.8393242
cvAUC 0.9601500
cvPer.Expl 63.4353465
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_Nspat_Ntag)
rel.inf
bathy_mean 33.7808593
temp_mean 20.4921335
AGI_0m 10.1395119
uostr_mean 6.1395169
AGI_60m 5.7829887
sal_mean 4.5642329
chl_mean 4.3701423
ssh_mean 3.8515309
vostr_mean 3.6254884
bathy_sd 1.6481303
uo_mean 1.6317208
mld_mean 1.5744189
vo_mean 1.4637452
pred_var 0.9355801
#explore partial plots
gbm.plot (agi_0m_60m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 8122.33
2 10 bathy_mean 2 temp_mean 616.02
3 10 bathy_mean 8 ssh_mean 594.41
4 12 AGI_0m 8 ssh_mean 501.98
5 13 AGI_60m 10 bathy_mean 480.31
6 10 bathy_mean 3 sal_mean 340.70
7 5 uostr_mean 2 temp_mean 239.76
8 9 mld_mean 6 vo_mean 198.10
9 13 AGI_60m 1 chl_mean 194.10
10 13 AGI_60m 3 sal_mean 172.10
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_60m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_60m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1969633 0.9206625 0.9883319 1.001621 0.7915479 0.8553011
agi_0m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[1 ])
# model performance
ggBRT:: ggPerformance (agi_0m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862490
Residual.Deviance 0.2101199
Correlation 0.9516769
AUC 0.9969000
Per.Expl 84.8425585
cvDeviance 0.5009849
cvCorrelation 0.8408802
cvAUC 0.9607700
cvPer.Expl 63.8603934
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_250m_Nspat_Ntag)
rel.inf
bathy_mean 29.6981397
temp_mean 19.9110798
AGI_250m 13.1371225
AGI_0m 8.8212041
uostr_mean 6.9872820
sal_mean 4.6639715
ssh_mean 4.1044072
chl_mean 3.5367397
vostr_mean 2.1325830
bathy_sd 1.7771055
uo_mean 1.4966185
mld_mean 1.4421522
vo_mean 1.3513807
pred_var 0.9402136
#explore partial plots
gbm.plot (agi_0m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_250m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 7413.64
2 12 AGI_0m 8 ssh_mean 379.59
3 8 ssh_mean 4 uo_mean 376.09
4 13 AGI_250m 2 temp_mean 374.64
5 10 bathy_mean 2 temp_mean 352.02
6 13 AGI_250m 10 bathy_mean 349.05
7 10 bathy_mean 3 sal_mean 343.47
8 10 bathy_mean 4 uo_mean 264.93
9 4 uo_mean 2 temp_mean 209.75
10 5 uostr_mean 1 chl_mean 166.77
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_250m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_250m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2000762 0.9178963 0.9878278 0.9987535 0.7871282 0.8484256
agi_0m_60m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[2 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862490
Residual.Deviance 0.1855561
Correlation 0.9598511
AUC 0.9980000
Per.Expl 86.6145154
cvDeviance 0.4907170
cvCorrelation 0.8451783
cvAUC 0.9623500
cvPer.Expl 64.6010907
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Nspat_Ntag)
rel.inf
bathy_mean 29.9734289
temp_mean 20.3673209
AGI_250m 12.2941229
AGI_0m 8.4707152
uostr_mean 6.0490757
sal_mean 3.9485657
AGI_60m 3.5161350
chl_mean 3.5092053
ssh_mean 3.0434925
vostr_mean 2.0985932
bathy_sd 1.6409157
vo_mean 1.4220106
uo_mean 1.4121966
mld_mean 1.4016687
pred_var 0.8525533
#explore partial plots
gbm.plot (agi_0m_60m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 5744.72
2 12 AGI_0m 8 ssh_mean 599.57
3 10 bathy_mean 3 sal_mean 441.15
4 10 bathy_mean 2 temp_mean 382.86
5 14 AGI_250m 2 temp_mean 328.55
6 13 AGI_60m 10 bathy_mean 279.25
7 14 AGI_250m 10 bathy_mean 268.58
8 10 bathy_mean 8 ssh_mean 257.93
9 10 bathy_mean 4 uo_mean 232.67
10 12 AGI_0m 3 sal_mean 185.87
11 12 AGI_0m 10 bathy_mean 173.22
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_60m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1883042 0.927851 0.9903983 1.000388 0.8077969 0.8661452
agi_0m_60m_250m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[3 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862490
Residual.Deviance 0.1679933
Correlation 0.9648664
AUC 0.9986000
Per.Expl 87.8814450
cvDeviance 0.4631752
cvCorrelation 0.8543076
cvAUC 0.9664000
cvPer.Expl 66.5878820
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Yspat_Ntag)
rel.inf
dist_coast 51.3624256
lat 9.3914556
AGI_0m 6.7567291
bathy_mean 5.5060080
temp_mean 4.8081540
AGI_250m 4.4615978
chl_mean 3.1022312
sal_mean 3.0090800
AGI_60m 2.3105716
ssh_mean 1.7128404
vostr_mean 1.2870285
mld_mean 1.2286325
uo_mean 1.2228660
uostr_mean 1.1546359
vo_mean 0.9892946
bathy_sd 0.9517202
pred_var 0.7447290
#explore partial plots
gbm.plot (agi_0m_60m_250m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Yspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 2706.06
2 13 AGI_0m 1 lat 480.87
3 13 AGI_0m 11 bathy_mean 300.50
4 6 uostr_mean 1 lat 293.87
5 11 bathy_mean 2 chl_mean 273.68
6 11 bathy_mean 3 temp_mean 269.12
7 16 AGI_250m 11 bathy_mean 265.48
8 14 dist_coast 10 mld_mean 219.03
9 11 bathy_mean 9 ssh_mean 209.13
10 13 AGI_0m 9 ssh_mean 176.75
11 11 bathy_mean 4 sal_mean 164.47
12 3 temp_mean 1 lat 147.70
13 11 bathy_mean 5 uo_mean 125.32
14 11 bathy_mean 1 lat 120.91
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Yspat_Ntag, dat_test_agi,
n.trees = agi_0m_60m_250m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Yspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1793447 0.9348399 0.9915013 0.9995818 0.8220452 0.8788144
Summary table of results
output_sum_Ntag <- read.csv (here ("data/brt/mod_outputs/brt_bckg_output_summary_Ntag.csv" ))
kableExtra:: kable (output_sum_Ntag)
base_0m_Nspat_Ntag
79.360
0.375
0.740
0.872
0.980
0.228
0.891
0.980
0.999
0.794
do_0m_Nspat_Ntag
84.210
0.316
0.743
0.899
0.986
0.208
0.911
0.986
0.998
0.842
do_0m_Yspat_Ntag
85.930
0.286
0.744
0.907
0.988
0.196
0.921
0.988
0.997
0.859
do_0m_60m_Nspat_Ntag
84.480
0.305
0.743
0.901
0.986
0.204
0.914
0.986
0.996
0.844
do_0m_250m_Nspat_Ntag
83.160
0.323
0.742
0.890
0.985
0.212
0.907
0.985
0.997
0.832
do_0m_60m_250m_Nspat_Ntag
85.800
0.292
0.744
0.907
0.987
0.198
0.919
0.987
0.999
0.858
do_0m_60m_250m_Yspat_Ntag
87.650
0.261
0.745
0.922
0.990
0.185
0.930
0.990
0.997
0.877
agi_0m_Nspat_Ntag
82.000
0.327
0.742
0.890
0.985
0.212
0.908
0.985
0.999
0.820
agi_0m_Yspat_Ntag
85.130
0.285
0.745
0.907
0.988
0.196
0.921
0.988
0.999
0.851
agi_0m_60m_Nspat_Ntag
85.530
0.289
0.744
0.902
0.988
0.197
0.921
0.988
1.000
0.855
agi_0m_250m_Nspat_Ntag
84.843
0.295
0.744
0.901
0.988
0.200
0.918
0.988
0.999
0.848
agi_0m_60m_250m_Nspat_Ntag
86.610
0.266
0.745
0.912
0.990
0.188
0.928
0.990
1.000
0.866
agi_0m_60m_250m_Yspat_Ntag
87.880
0.247
0.746
0.923
0.992
0.179
0.935
0.992
1.000
0.879
ggplot (output_sum_Ntag, aes (AUC, TSS, color = percent_explained, label = model)) +
geom_point (size = 3 ) +
xlab ('AUC' ) +
ylab ('TSS' ) +
scale_color_gradientn (colors = MetBrewer:: met.brewer ("Greek" )) +
ggrepel:: geom_label_repel (aes (label = model),
box.padding = 0.35 ,
point.padding = 0.5 ,
segment.color = 'grey50' )
Base models w/o tag ID and w/ data at seasonal and annual resolutions
For these models, the environmental raster data was averaged according to season and year. Observed and pseudo absence locations were then used for environmental data extraction along these raster files and were matched to each file according to either the season or year.
explore_brt (mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_base_0m_seas_Nspat_Ntag.rds" ,
test_data = dat_test_base_s)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862930
Residual.Deviance 0.3717943
Correlation 0.8912655
AUC 0.9811000
Per.Expl 73.1806827
cvDeviance 0.5434223
cvCorrelation 0.8203757
cvAUC 0.9544500
cvPer.Expl 60.8003302
[1] "Relative influence of predictor variables"
rel.inf
vo_mean 37.6115543
vostr_mean 12.9884480
bathy_mean 9.2018678
uostr_mean 9.0931029
ssh_mean 8.2555616
sal_mean 6.5148137
temp_mean 5.1868409
mld_mean 4.0654653
chl_mean 2.9151965
uo_mean 2.0919827
bathy_sd 1.2334844
pred_var 0.8416821
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 sal_mean 1 mld_mean 907.84
2 10 bathy_mean 6 uostr_mean 469.57
3 8 vostr_mean 4 temp_mean 379.84
4 10 bathy_mean 2 sal_mean 261.59
5 7 vo_mean 5 uo_mean 219.43
6 3 ssh_mean 2 sal_mean 152.45
7 8 vostr_mean 7 vo_mean 141.19
[1] "Calculated percent deviance"
[1] 0.4192523
[1] "TPR"
[1] 0.7364798
[1] "TSS"
[1] 0.8396551
[1] "Percent deviance calculated by in house functions"
[1] 73.18068
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2454028 0.8722718 0.9737081 1.003879 0.6974331 0.7318068
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_base_0m_ann_Nspat_Ntag.rds" ,
test_data = dat_test_base_a)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862892
Residual.Deviance 0.3209531
Correlation 0.9119722
AUC 0.9875000
Per.Expl 76.8480398
cvDeviance 0.5356092
cvCorrelation 0.8241442
cvAUC 0.9553500
cvPer.Expl 61.3638176
[1] "Relative influence of predictor variables"
rel.inf
vo_mean 38.8075023
vostr_mean 16.2673317
uostr_mean 11.4492219
bathy_mean 10.1191174
chl_mean 5.6156607
sal_mean 4.5053210
temp_mean 3.5987576
ssh_mean 2.6590283
mld_mean 2.5230017
uo_mean 2.0857363
bathy_sd 1.4172504
pred_var 0.9520707
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 vostr_mean 6 uostr_mean 1285.85
2 3 ssh_mean 1 mld_mean 541.81
3 8 vostr_mean 3 ssh_mean 540.45
4 7 vo_mean 6 uostr_mean 534.47
5 8 vostr_mean 1 mld_mean 376.41
6 8 vostr_mean 4 temp_mean 354.21
7 10 bathy_mean 8 vostr_mean 334.03
[1] "Calculated percent deviance"
[1] 0.3774674
[1] "TPR"
[1] 0.7394418
[1] "TSS"
[1] 0.8624481
[1] "Percent deviance calculated by in house functions"
[1] 76.84804
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6300 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2307242 0.8885542 0.9793393 1.004532 0.7277093 0.7684804
DO models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag Seasonal, Yspat, Ntag Annual, Nspat, Ntag Annual, Yspat, Ntag Daily, seasonal, and Annual, Nspat, Ntag Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt (mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2418172
Correlation 0.9401701
AUC 0.9946000
Per.Expl 82.5564495
cvDeviance 0.4926303
cvCorrelation 0.8411094
cvAUC 0.9618100
cvPer.Expl 64.4639732
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m_seas 29.9550272
bathy_mean 28.1716202
o2_mean_250m_seas 13.6726213
o2_mean_60m_seas 7.3518430
chl_mean 3.5030980
ssh_mean 3.2048426
temp_mean 3.0233286
sal_mean 2.5361519
bathy_sd 1.5804684
uostr_mean 1.4283298
mld_mean 1.3660243
vostr_mean 1.2743367
uo_mean 1.2458839
vo_mean 0.9471799
pred_var 0.7392442
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 14 o2_mean_60m_seas 3 sal_mean 286.26
2 10 bathy_mean 4 uo_mean 245.71
3 13 o2_mean_0m_seas 8 ssh_mean 230.00
4 15 o2_mean_250m_seas 10 bathy_mean 227.68
5 10 bathy_mean 1 chl_mean 221.03
6 10 bathy_mean 3 sal_mean 201.17
7 14 o2_mean_60m_seas 10 bathy_mean 188.40
8 13 o2_mean_0m_seas 2 temp_mean 171.51
9 13 o2_mean_0m_seas 10 bathy_mean 157.14
10 14 o2_mean_60m_seas 2 temp_mean 154.23
11 10 bathy_mean 8 ssh_mean 138.67
[1] "Calculated percent deviance"
[1] 0.3103337
[1] "TPR"
[1] 0.7433213
[1] "TSS"
[1] 0.8939959
[1] "Percent deviance calculated by in house functions"
[1] 82.55645
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7100 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2061117 0.9125669 0.9864388 1.001678 0.7761335 0.8255645
explore_brt (mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2287664
Correlation 0.9445079
AUC 0.9954000
Per.Expl 83.4978693
cvDeviance 0.4764336
cvCorrelation 0.8476000
cvAUC 0.9640900
cvPer.Expl 65.6323258
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.6082639
o2_mean_0m_seas 11.3961720
o2_mean_250m_seas 7.7265238
o2_mean_60m_seas 4.5488533
bathy_mean 3.9480523
lat 3.9238771
chl_mean 3.0456122
temp_mean 2.7586462
sal_mean 2.2400707
ssh_mean 1.9081300
mld_mean 1.2771601
vostr_mean 1.1909067
uo_mean 1.1179762
bathy_sd 0.9683763
uostr_mean 0.8761205
vo_mean 0.8223314
pred_var 0.6429273
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 17 o2_mean_250m_seas 11 bathy_mean 272.35
2 11 bathy_mean 3 temp_mean 270.80
3 11 bathy_mean 2 chl_mean 265.35
4 6 uostr_mean 1 lat 196.73
5 16 o2_mean_60m_seas 11 bathy_mean 181.67
6 15 o2_mean_0m_seas 1 lat 175.06
7 16 o2_mean_60m_seas 4 sal_mean 175.02
8 15 o2_mean_0m_seas 3 temp_mean 167.89
9 11 bathy_mean 4 sal_mean 158.34
10 11 bathy_mean 5 uo_mean 157.94
11 11 bathy_mean 9 ssh_mean 144.56
12 16 o2_mean_60m_seas 3 temp_mean 142.99
13 13 dist_coast 10 mld_mean 142.57
14 13 dist_coast 8 vostr_mean 130.50
[1] "Calculated percent deviance"
[1] 0.2975476
[1] "TPR"
[1] 0.744045
[1] "TSS"
[1] 0.8977673
[1] "Percent deviance calculated by in house functions"
[1] 83.49787
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7200 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2013708 0.916764 0.9875883 1.000038 0.785357 0.8349787
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2658281
Correlation 0.9337458
AUC 0.9934000
Per.Expl 80.8244119
cvDeviance 0.5297688
cvCorrelation 0.8266086
cvAUC 0.9564300
cvPer.Expl 61.7849717
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.9042605
o2_mean_0m_ann 22.5395832
o2_mean_250m_ann 14.4483395
temp_mean 7.5881348
o2_mean_60m_ann 6.8540577
chl_mean 4.2914489
sal_mean 3.5883053
ssh_mean 2.8955778
uostr_mean 1.9748209
vostr_mean 1.6451343
mld_mean 1.5488336
bathy_sd 1.4989186
uo_mean 1.4097996
vo_mean 1.0829564
pred_var 0.7298289
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 14 o2_mean_60m_ann 2 temp_mean 391.78
2 10 bathy_mean 4 uo_mean 212.96
3 10 bathy_mean 2 temp_mean 170.12
4 10 bathy_mean 1 chl_mean 153.06
5 14 o2_mean_60m_ann 10 bathy_mean 151.00
6 10 bathy_mean 3 sal_mean 147.84
7 10 bathy_mean 8 ssh_mean 145.21
8 8 ssh_mean 5 uostr_mean 126.87
9 15 o2_mean_250m_ann 13 o2_mean_0m_ann 126.07
10 14 o2_mean_60m_ann 13 o2_mean_0m_ann 111.62
11 15 o2_mean_250m_ann 2 temp_mean 111.47
[1] "Calculated percent deviance"
[1] 0.3318456
[1] "TPR"
[1] 0.7424093
[1] "TSS"
[1] 0.8858185
[1] "Percent deviance calculated by in house functions"
[1] 80.82441
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2135096 0.9062035 0.9851365 1.000257 0.7606153 0.8082441
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2401510
Correlation 0.9422127
AUC 0.9951000
Per.Expl 82.6766351
cvDeviance 0.5078189
cvCorrelation 0.8345401
cvAUC 0.9599600
cvPer.Expl 63.3683366
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.4405492
lat 7.1991001
o2_mean_250m_ann 5.9536531
o2_mean_0m_ann 5.8598880
bathy_mean 4.8365615
chl_mean 3.9782780
temp_mean 3.5566644
o2_mean_60m_ann 3.4031130
sal_mean 2.8806083
ssh_mean 2.2956462
vostr_mean 1.4221959
mld_mean 1.2744067
uo_mean 1.2559427
bathy_sd 1.0945539
uostr_mean 0.9547095
vo_mean 0.9419010
pred_var 0.6522284
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 515.53
2 16 o2_mean_60m_ann 11 bathy_mean 370.15
3 11 bathy_mean 2 chl_mean 340.14
4 6 uostr_mean 1 lat 262.36
5 11 bathy_mean 5 uo_mean 188.30
6 15 o2_mean_0m_ann 1 lat 171.06
7 16 o2_mean_60m_ann 3 temp_mean 166.92
8 11 bathy_mean 9 ssh_mean 157.85
9 8 vostr_mean 5 uo_mean 143.28
10 16 o2_mean_60m_ann 1 lat 124.98
11 17 o2_mean_250m_ann 1 lat 118.68
12 13 dist_coast 10 mld_mean 103.23
13 3 temp_mean 1 lat 101.93
14 11 bathy_mean 4 sal_mean 93.56
[1] "Calculated percent deviance"
[1] 0.3027324
[1] "TPR"
[1] 0.7439359
[1] "TSS"
[1] 0.9025282
[1] "Percent deviance calculated by in house functions"
[1] 82.67664
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2024333 0.9163195 0.9879827 0.9982252 0.7816168 0.8267664
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.1835771
Correlation 0.9599527
AUC 0.9980000
Per.Expl 86.7576132
cvDeviance 0.4558125
cvCorrelation 0.8568120
cvAUC 0.9668800
cvPer.Expl 67.1198347
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.6902874
o2_mean_0m 20.7771528
o2_mean_250m_seas 9.4205590
o2_mean_0m_seas 8.9795725
o2_mean_60m_seas 5.5714036
o2_mean_250m_ann 3.9953438
o2_mean_0m_ann 3.5155437
chl_mean 2.7120000
temp_mean 2.4568315
o2_mean_60m_ann 2.2287795
ssh_mean 2.1615954
sal_mean 1.9774404
o2_mean_60m 1.7098348
o2_mean_250m 1.5994560
bathy_sd 1.0616884
mld_mean 1.0452667
vostr_mean 1.0366101
uo_mean 0.9321002
uostr_mean 0.8335454
vo_mean 0.7263540
pred_var 0.5686348
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 18 o2_mean_250m_seas 11 bathy_mean 337.89
2 3 temp_mean 1 o2_mean_0m 274.53
3 13 o2_mean_60m 11 bathy_mean 237.43
4 20 o2_mean_60m_ann 11 bathy_mean 208.42
5 18 o2_mean_250m_seas 14 o2_mean_250m 204.43
6 11 bathy_mean 3 temp_mean 201.05
7 11 bathy_mean 4 sal_mean 176.86
8 16 o2_mean_0m_seas 12 bathy_sd 168.86
9 11 bathy_mean 2 chl_mean 160.33
10 11 bathy_mean 5 uo_mean 157.91
11 17 o2_mean_60m_seas 4 sal_mean 148.67
12 2 chl_mean 1 o2_mean_0m 118.15
13 19 o2_mean_0m_ann 4 sal_mean 117.62
14 16 o2_mean_0m_seas 9 ssh_mean 116.45
15 19 o2_mean_0m_ann 3 temp_mean 113.55
16 11 bathy_mean 9 ssh_mean 112.36
17 20 o2_mean_60m_ann 17 o2_mean_60m_seas 90.36
18 19 o2_mean_0m_ann 17 o2_mean_60m_seas 82.73
19 17 o2_mean_60m_seas 11 bathy_mean 82.57
20 9 ssh_mean 4 sal_mean 80.14
21 20 o2_mean_60m_ann 3 temp_mean 76.93
22 16 o2_mean_0m_seas 4 sal_mean 69.88
[1] "Calculated percent deviance"
[1] 0.2450293
[1] "TPR"
[1] 0.7464499
[1] "TSS"
[1] 0.928109
[1] "Percent deviance calculated by in house functions"
[1] 86.75761
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1783497 0.9358277 0.9920179 1.000986 0.8232424 0.8675761
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.1743310
Correlation 0.9628367
AUC 0.9983000
Per.Expl 87.4245846
cvDeviance 0.4469638
cvCorrelation 0.8599570
cvAUC 0.9680700
cvPer.Expl 67.7581364
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 49.3246938
o2_mean_0m 8.2391043
o2_mean_0m_seas 5.1503728
o2_mean_250m_seas 5.0655830
lat 3.5620562
bathy_mean 3.3679359
o2_mean_60m_seas 3.0915358
o2_mean_250m_ann 2.7603353
chl_mean 2.5636051
temp_mean 2.2305884
sal_mean 1.9585958
o2_mean_60m 1.6960502
o2_mean_250m 1.6332495
o2_mean_60m_ann 1.5901724
ssh_mean 1.4767219
vostr_mean 0.9417095
mld_mean 0.9346010
o2_mean_0m_ann 0.9281548
uo_mean 0.8093395
bathy_sd 0.7440029
uostr_mean 0.7288300
vo_mean 0.6799355
pred_var 0.5228264
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 22 o2_mean_60m_ann 12 bathy_mean 511.25
2 12 bathy_mean 3 chl_mean 318.95
3 20 o2_mean_250m_seas 12 bathy_mean 313.81
4 20 o2_mean_250m_seas 16 o2_mean_250m 313.27
5 15 o2_mean_60m 12 bathy_mean 281.36
6 12 bathy_mean 4 temp_mean 246.45
7 4 temp_mean 2 o2_mean_0m 224.25
8 12 bathy_mean 5 sal_mean 156.85
9 14 dist_coast 9 vostr_mean 148.54
10 14 dist_coast 11 mld_mean 98.98
11 19 o2_mean_60m_seas 5 sal_mean 96.21
12 2 o2_mean_0m 1 lat 92.41
13 7 uostr_mean 1 lat 92.40
14 21 o2_mean_0m_ann 1 lat 88.81
15 3 chl_mean 2 o2_mean_0m 88.02
16 23 o2_mean_250m_ann 20 o2_mean_250m_seas 84.94
17 12 bathy_mean 10 ssh_mean 77.14
18 12 bathy_mean 6 uo_mean 76.75
19 15 o2_mean_60m 4 temp_mean 75.81
20 9 vostr_mean 7 uostr_mean 66.87
21 12 bathy_mean 1 lat 64.02
22 9 vostr_mean 2 o2_mean_0m 63.47
23 18 o2_mean_0m_seas 13 bathy_sd 60.77
24 18 o2_mean_0m_seas 1 lat 59.88
25 19 o2_mean_60m_seas 12 bathy_mean 58.55
26 15 o2_mean_60m 6 uo_mean 55.64
[1] "Calculated percent deviance"
[1] 0.237065
[1] "TPR"
[1] 0.7468409
[1] "TSS"
[1] 0.9323182
[1] "Percent deviance calculated by in house functions"
[1] 87.42458
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8450 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1745892 0.9385744 0.9923763 0.9989385 0.8289876 0.8742458
AGI models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag Seasonal, Yspat, Ntag Annual, Nspat, Ntag Annual, Yspat, Ntag Daily, seasonal, and Annual, Nspat, Ntag Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt (mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2321944
Correlation 0.9450501
AUC 0.9958000
Per.Expl 83.2505563
cvDeviance 0.5034843
cvCorrelation 0.8378703
cvAUC 0.9602500
cvPer.Expl 63.6809390
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 28.4568141
temp_mean 20.6998026
AGI_250m_seas 15.6249042
uostr_mean 5.7123417
AGI_0m_seas 5.2282459
sal_mean 4.9112847
AGI_60m_seas 4.1076147
chl_mean 3.4420183
ssh_mean 3.1190152
bathy_sd 1.8936717
vostr_mean 1.7388308
mld_mean 1.6483372
uo_mean 1.4582145
vo_mean 1.3148051
pred_var 0.6440993
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 3 sal_mean 771.03
2 15 AGI_250m_seas 2 temp_mean 526.27
3 10 bathy_mean 2 temp_mean 384.13
4 13 AGI_0m_seas 6 vo_mean 265.66
5 6 vo_mean 3 sal_mean 224.98
6 14 AGI_60m_seas 2 temp_mean 186.12
7 15 AGI_250m_seas 3 sal_mean 173.26
8 15 AGI_250m_seas 10 bathy_mean 168.54
9 13 AGI_0m_seas 3 sal_mean 157.30
10 2 temp_mean 1 chl_mean 147.38
11 14 AGI_60m_seas 10 bathy_mean 126.45
[1] "Calculated percent deviance"
[1] 0.3053554
[1] "TPR"
[1] 0.7435394
[1] "TSS"
[1] 0.9009719
[1] "Percent deviance calculated by in house functions"
[1] 83.25056
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8500 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.203208 0.9152552 0.9868737 1.000793 0.7797258 0.8325056
explore_brt (mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2237274
Correlation 0.9471479
AUC 0.9962000
Per.Expl 83.8613221
cvDeviance 0.4860336
cvCorrelation 0.8436767
cvAUC 0.9623400
cvPer.Expl 64.9397517
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.5815059
lat 6.9593938
AGI_250m_seas 6.5692316
temp_mean 5.2945894
bathy_mean 4.9780355
AGI_0m_seas 4.5385997
AGI_60m_seas 3.7517941
chl_mean 3.3517654
sal_mean 3.2690448
ssh_mean 1.7173600
mld_mean 1.3612586
vostr_mean 1.2099019
uo_mean 1.1259297
uostr_mean 0.9964419
bathy_sd 0.8685392
vo_mean 0.8486002
pred_var 0.5780083
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 444.33
2 13 dist_coast 10 mld_mean 334.59
3 17 AGI_250m_seas 11 bathy_mean 268.59
4 15 AGI_0m_seas 4 sal_mean 177.64
5 13 dist_coast 8 vostr_mean 164.30
6 15 AGI_0m_seas 7 vo_mean 156.50
7 16 AGI_60m_seas 11 bathy_mean 150.04
8 11 bathy_mean 9 ssh_mean 139.48
9 3 temp_mean 1 lat 135.73
10 4 sal_mean 1 lat 133.75
11 11 bathy_mean 2 chl_mean 131.79
12 6 uostr_mean 1 lat 125.32
13 13 dist_coast 1 lat 106.60
14 15 AGI_0m_seas 2 chl_mean 84.71
[1] "Calculated percent deviance"
[1] 0.2881212
[1] "TPR"
[1] 0.7445508
[1] "TSS"
[1] 0.9054166
[1] "Percent deviance calculated by in house functions"
[1] 83.86132
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1967264 0.9209808 0.9886739 1.001141 0.792158 0.8386132
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2613223
Correlation 0.9358527
AUC 0.9941000
Per.Expl 81.1493964
cvDeviance 0.5379650
cvCorrelation 0.8231527
cvAUC 0.9551000
cvPer.Expl 61.1936563
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.3779108
temp_mean 20.9455029
AGI_250m_ann 14.6419280
uostr_mean 6.3928196
sal_mean 5.0720853
AGI_60m_ann 4.5467677
chl_mean 4.0243117
ssh_mean 3.2096623
AGI_0m_ann 2.6759034
bathy_sd 2.0730979
vostr_mean 1.9352694
mld_mean 1.6660524
uo_mean 1.5344903
vo_mean 1.2263531
pred_var 0.6778453
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 3 sal_mean 650.32
2 14 AGI_60m_ann 10 bathy_mean 393.12
3 15 AGI_250m_ann 2 temp_mean 310.21
4 13 AGI_0m_ann 2 temp_mean 299.08
5 10 bathy_mean 2 temp_mean 284.67
6 13 AGI_0m_ann 3 sal_mean 242.75
7 8 ssh_mean 5 uostr_mean 233.15
8 2 temp_mean 1 chl_mean 215.23
9 14 AGI_60m_ann 2 temp_mean 148.02
10 15 AGI_250m_ann 13 AGI_0m_ann 139.98
11 6 vo_mean 3 sal_mean 131.65
[1] "Calculated percent deviance"
[1] 0.3306731
[1] "TPR"
[1] 0.7424086
[1] "TSS"
[1] 0.8853276
[1] "Percent deviance calculated by in house functions"
[1] 81.1494
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2134505 0.9061452 0.9852186 1.002061 0.7614624 0.811494
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2210721
Correlation 0.9490763
AUC 0.9966000
Per.Expl 84.0528649
cvDeviance 0.5036031
cvCorrelation 0.8359368
cvAUC 0.9606200
cvPer.Expl 63.6723657
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.3678090
lat 8.3057941
AGI_60m_ann 6.0633677
bathy_mean 5.4719648
AGI_250m_ann 5.0030563
temp_mean 4.6716351
chl_mean 3.9646719
sal_mean 3.5807263
ssh_mean 2.2066992
AGI_0m_ann 2.0502544
mld_mean 1.3602811
vostr_mean 1.3535883
uo_mean 1.1243457
bathy_sd 1.0313205
uostr_mean 0.9999780
vo_mean 0.8202189
pred_var 0.6242886
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 16 AGI_60m_ann 11 bathy_mean 597.37
2 11 bathy_mean 3 temp_mean 358.97
3 17 AGI_250m_ann 16 AGI_60m_ann 194.96
4 15 AGI_0m_ann 4 sal_mean 191.27
5 4 sal_mean 1 lat 180.01
6 13 dist_coast 10 mld_mean 178.93
7 6 uostr_mean 1 lat 169.73
8 11 bathy_mean 2 chl_mean 167.64
9 11 bathy_mean 9 ssh_mean 156.75
10 3 temp_mean 1 lat 146.44
11 13 dist_coast 8 vostr_mean 136.64
12 2 chl_mean 1 lat 134.35
13 11 bathy_mean 1 lat 126.90
14 16 AGI_60m_ann 4 sal_mean 109.84
[1] "Calculated percent deviance"
[1] 0.2918624
[1] "TPR"
[1] 0.7443891
[1] "TSS"
[1] 0.9018302
[1] "Percent deviance calculated by in house functions"
[1] 84.05286
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1982769 0.9197381 0.9885987 1.001188 0.7894592 0.8405286
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.1785652
Correlation 0.9621288
AUC 0.9983000
Per.Expl 87.1191229
cvDeviance 0.4548805
cvCorrelation 0.8575571
cvAUC 0.9665200
cvPer.Expl 67.1869914
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.2218447
temp_mean 19.1146917
AGI_250m_seas 9.8455447
AGI_0m 6.6126489
uostr_mean 5.3813311
sal_mean 3.7622783
AGI_0m_seas 3.5537907
AGI_250m 3.1561957
AGI_250m_ann 3.0401735
AGI_60m_ann 2.8710090
chl_mean 2.3231281
ssh_mean 2.1705750
AGI_60m_seas 2.1206609
AGI_60m 1.4160122
vostr_mean 1.3316066
bathy_sd 1.3253579
AGI_0m_ann 1.2533802
uo_mean 1.0833600
vo_mean 1.0068085
mld_mean 0.9705674
pred_var 0.4390348
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 3157.47
2 20 AGI_60m_ann 10 bathy_mean 383.13
3 10 bathy_mean 3 sal_mean 293.95
4 19 AGI_0m_ann 16 AGI_0m_seas 285.37
5 19 AGI_0m_ann 10 bathy_mean 269.38
6 16 AGI_0m_seas 6 vo_mean 264.72
7 20 AGI_60m_ann 16 AGI_0m_seas 263.95
8 16 AGI_0m_seas 13 AGI_60m 211.00
9 12 AGI_0m 10 bathy_mean 195.06
10 18 AGI_250m_seas 10 bathy_mean 182.52
11 18 AGI_250m_seas 2 temp_mean 180.05
12 10 bathy_mean 2 temp_mean 157.93
13 4 uo_mean 2 temp_mean 127.37
14 19 AGI_0m_ann 11 bathy_sd 123.67
15 12 AGI_0m 8 ssh_mean 109.90
16 12 AGI_0m 3 sal_mean 101.87
17 16 AGI_0m_seas 4 uo_mean 101.51
18 18 AGI_250m_seas 3 sal_mean 89.27
19 21 AGI_250m_ann 14 AGI_250m 83.72
20 16 AGI_0m_seas 2 temp_mean 76.37
21 21 AGI_250m_ann 3 sal_mean 75.91
22 12 AGI_0m 11 bathy_sd 74.93
[1] "Calculated percent deviance"
[1] 0.2444309
[1] "TPR"
[1] 0.7462479
[1] "TSS"
[1] 0.927292
[1] "Percent deviance calculated by in house functions"
[1] 87.11912
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8800 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1772743 0.936553 0.9916911 0.9996831 0.8236749 0.8711912
explore_brt (mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.1715113
Correlation 0.9638951
AUC 0.9985000
Per.Expl 87.6279537
cvDeviance 0.4400447
cvCorrelation 0.8619151
cvAUC 0.9686500
cvPer.Expl 68.2571797
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 50.4410262
lat 7.2678587
AGI_0m 5.6576717
AGI_60m_ann 4.4450574
bathy_mean 4.2815646
temp_mean 3.3935652
AGI_0m_seas 2.9909112
AGI_250m_seas 2.6539915
sal_mean 2.4793934
chl_mean 2.3790803
AGI_60m_seas 2.0670030
AGI_250m_ann 1.7341816
AGI_250m 1.6426522
ssh_mean 1.3367289
AGI_60m 1.1879353
AGI_0m_ann 1.1650316
vostr_mean 0.8987196
uo_mean 0.8292838
mld_mean 0.7767813
uostr_mean 0.7072603
vo_mean 0.6524355
bathy_sd 0.6298252
pred_var 0.3820413
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1024.13
2 22 AGI_60m_ann 11 bathy_mean 547.36
3 13 AGI_0m 1 lat 469.15
4 18 AGI_0m_seas 7 vo_mean 335.18
5 13 AGI_0m 11 bathy_mean 241.70
6 18 AGI_0m_seas 15 AGI_60m 223.49
7 21 AGI_0m_ann 11 bathy_mean 223.23
8 21 AGI_0m_ann 18 AGI_0m_seas 206.43
9 20 AGI_250m_seas 11 bathy_mean 199.82
10 14 dist_coast 8 vostr_mean 186.60
11 22 AGI_60m_ann 18 AGI_0m_seas 159.23
12 6 uostr_mean 1 lat 137.96
13 14 dist_coast 10 mld_mean 120.60
14 13 AGI_0m 6 uostr_mean 92.94
15 15 AGI_60m 11 bathy_mean 82.93
16 11 bathy_mean 3 temp_mean 80.85
17 4 sal_mean 1 lat 80.17
18 11 bathy_mean 2 chl_mean 74.27
19 20 AGI_250m_seas 16 AGI_250m 62.00
20 11 bathy_mean 5 uo_mean 60.05
21 19 AGI_60m_seas 13 AGI_0m 56.78
22 23 AGI_250m_ann 4 sal_mean 53.45
23 11 bathy_mean 9 ssh_mean 51.53
24 18 AGI_0m_seas 4 sal_mean 49.55
25 14 dist_coast 1 lat 47.46
26 3 temp_mean 1 lat 46.71
[1] "Calculated percent deviance"
[1] 0.2369278
[1] "TPR"
[1] 0.7466861
[1] "TSS"
[1] 0.9304143
[1] "Percent deviance calculated by in house functions"
[1] 87.62795
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8650 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1747118 0.9383493 0.9920527 1.000018 0.8290874 0.8762795
Summary table of results
output_sum_seas_ann <- read.csv (here ("data/brt/mod_outputs/brt_background_seas_ann_output_summary.csv" ))
kableExtra:: kable (output_sum)
base_0m_Nspat_Ntag
79.360
0.375
0.740
0.872
0.980
0.228
0.891
0.980
0.998
0.794
base_0m_Nspat_Ytag
91.990
0.182
0.759
0.957
0.994
0.147
0.957
0.994
0.994
0.920
base_0m_Yspat_Ytag
93.500
0.154
0.771
0.965
0.995
0.134
0.964
0.995
0.990
0.935
do_0m_Nspat_Ytag
93.820
0.131
0.769
0.976
0.997
0.120
0.972
0.997
0.998
0.938
do_0m_Yspat_Ytag
94.870
0.114
0.781
0.977
0.998
0.112
0.975
0.998
0.998
0.949
do_0m_60m_Nspat_Ytag
95.007
0.118
0.772
0.978
0.997
0.113
0.975
0.997
0.997
0.950
do_0m_250m_Nspat_Ytag
94.430
0.123
0.777
0.976
0.997
0.117
0.973
0.997
0.997
0.944
do_0m_60m_250m_Nspat_Ytag
95.380
0.113
0.778
0.979
0.997
0.111
0.975
0.997
0.997
0.954
do_0m_60m_250m_Yspat_Ytag
95.590
0.103
0.783
0.978
0.998
0.107
0.977
0.998
0.996
0.956
agi_0m_Nspat_Ytag
93.470
0.143
0.764
0.970
0.996
0.127
0.968
0.996
0.999
0.934
agi_0m_Yspat_Ytag
94.560
0.124
0.776
0.976
0.997
0.117
0.973
0.997
0.996
0.946
agi_0m_60m_Nspat_Ytag
94.111
0.131
0.764
0.971
0.997
0.121
0.971
0.997
0.998
0.941
agi_0m_250m_Nspat_Ytag
93.990
0.137
0.769
0.970
0.996
0.124
0.970
0.996
0.997
0.940
agi_0m_60m_250m_Nspat_Ytag
94.540
0.129
0.768
0.973
0.996
0.118
0.972
0.996
0.998
0.945
agi_0m_60m_250m_Yspat_Ytag
94.780
0.122
0.773
0.974
0.997
0.116
0.973
0.997
0.996
0.948
ggplot (output_sum_seas_ann, aes (AUC, TSS, color = percent_explained, label = model)) +
geom_point (size = 3 ) +
xlab ('AUC' ) +
ylab ('TSS' ) +
scale_color_gradientn (colors = MetBrewer:: met.brewer ("Greek" )) +
ggrepel:: geom_label_repel (aes (label = model),
box.padding = 0.35 ,
point.padding = 0.5 ,
segment.color = 'grey50' )